shapleyvic value
Shapley variable importance cloud for machine learning models
Ning, Yilin, Liu, Mingxuan, Liu, Nan
Current practice in interpretable machine learning often focuses on explaining the final model trained from data, e.g., by using the Shapley additive explanations (SHAP) method. The recently developed Shapley variable importance cloud (ShapleyVIC) extends the current practice to a group of "nearly optimal models" to provide comprehensive and robust variable importance assessments, with estimated uncertainty intervals for a more complete understanding of variable contributions to predictions. ShapleyVIC was initially developed for applications with traditional regression models, and the benefits of ShapleyVIC inference have been demonstrated in real-life prediction tasks using the logistic regression model. However, as a model-agnostic approach, ShapleyVIC application is not limited to such scenarios. In this work, we extend ShapleyVIC implementation for machine learning models to enable wider applications, and propose it as a useful complement to the current SHAP analysis to enable more trustworthy applications of these black-box models.
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.69)
A novel interpretable machine learning system to generate clinical risk scores: An application for predicting early mortality or unplanned readmission in a retrospective cohort study
Ning, Yilin, Li, Siqi, Ong, Marcus Eng Hock, Xie, Feng, Chakraborty, Bibhas, Ting, Daniel Shu Wei, Liu, Nan
Risk scores are widely used for clinical decision making and commonly generated from logistic regression models. Machine-learning-based methods may work well for identifying important predictors, but such 'black box' variable selection limits interpretability, and variable importance evaluated from a single model can be biased. We propose a robust and interpretable variable selection approach using the recently developed Shapley variable importance cloud (ShapleyVIC) that accounts for variability across models. Our approach evaluates and visualizes overall variable contributions for in-depth inference and transparent variable selection, and filters out non-significant contributors to simplify model building steps. We derive an ensemble variable ranking from variable contributions, which is easily integrated with an automated and modularized risk score generator, AutoScore, for convenient implementation. In a study of early death or unplanned readmission, ShapleyVIC selected 6 of 41 candidate variables to create a well-performing model, which had similar performance to a 16-variable model from machine-learning-based ranking.
- Asia > Singapore > Central Region > Singapore (0.05)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- Asia > Taiwan (0.04)